Document image zone classification - a simple high-performance approach
Authors
Abstract
We describe a simple, fast, and accurate system for document image zone classification, an important subproblem of document image analysis, that results from a detailed analysis of different features. Using a novel combination of known algorithms, we achieve a very competitive error rate of 1.46% (n = 13811), compared with the 1.55% (n = 24177) reported by Wang et al. (2006) using more complicated techniques. The experiments were performed on zones extracted from the widely used UW-III database, which is representative of images of scanned journal pages and contains ground-truthed real-world data.
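To put the two error rates on a common footing, the short sketch below converts each percentage back into an approximate count of misclassified zones and attaches a normal-approximation 95% confidence interval. It assumes the reported figures are plain per-zone error fractions over the stated n; the function names are hypothetical, and the counts and intervals are derived here, not reported in the abstract.

```python
import math


def implied_errors(error_rate_pct, n):
    """Approximate number of misclassified zones implied by a percentage error rate over n zones."""
    return round(error_rate_pct / 100 * n)


def normal_ci(error_rate_pct, n, z=1.96):
    """Approximate 95% confidence interval (as fractions) for an error rate,
    using the normal approximation to the binomial distribution."""
    p = error_rate_pct / 100
    half_width = z * math.sqrt(p * (1 - p) / n)
    return (p - half_width, p + half_width)


if __name__ == "__main__":
    for label, rate, n in [("this work", 1.46, 13811), ("Wang et al. (2006)", 1.55, 24177)]:
        lo, hi = normal_ci(rate, n)
        print(f"{label}: ~{implied_errors(rate, n)} errors, 95% CI [{lo:.4f}, {hi:.4f}]")
```

The implied counts are roughly 202 and 375 misclassified zones; because the two test sets differ in size, the interval widths also differ, which is worth keeping in mind when comparing the rates.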
Similar resources
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered the current state-of-the-art model for this task. However, these classifiers have two major drawbacks: the huge computational power demand for...
Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback
Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach to document image retrieval based on keyword spotting is proposed. In the proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method, is used in this paper to imp...
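The excerpt does not say which relevance-feedback update the authors use. As an illustration only, the sketch below swaps in Rocchio's classic rule: word images are assumed to be represented as fixed-length feature vectors, and the query vector is moved toward the centroid of results the user marks relevant and away from those marked non-relevant. The weights `alpha`, `beta`, `gamma` and all function names are hypothetical choices, not taken from the paper.

```python
import math


def centroid(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio's rule: alpha*q + beta*mean(relevant) - gamma*mean(non_relevant)."""
    dim = len(query)
    rel_c = centroid(relevant, dim)
    non_c = centroid(non_relevant, dim)
    return [alpha * query[i] + beta * rel_c[i] - gamma * non_c[i] for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, candidates):
    """Candidate indices sorted by decreasing cosine similarity to the query."""
    return sorted(range(len(candidates)), key=lambda i: cosine(query, candidates[i]), reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors for three candidate word images.
    candidates = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
    query = [1.0, 0.2]
    print("before feedback:", rank(query, candidates))
    # The user marks candidate 2 as relevant and candidate 0 as non-relevant.
    updated = rocchio_update(query, [candidates[2]], [candidates[0]])
    print("after feedback: ", rank(updated, candidates))
```

In this toy run the candidate the user rejected drops from first to last place after one round of feedback, which is the interactive refinement loop the abstract describes in general terms.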
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system that segments and classifies the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction and document classification. In preprocessing, the document image is enhanced through noise removal, binarization, and skew detection and correction. In document segmentation, an algorith...
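The excerpt lists binarization as a preprocessing step without naming a method. One widely used choice is Otsu's global threshold, which picks the gray level that maximizes the between-class variance of the dark (ink) and light (background) pixels. The sketch below is that swapped-in technique, not necessarily the authors'; it assumes 8-bit grayscale pixels given as a flat list of integers, and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.
    Pixels <= t form the dark class (ink), pixels > t the light class (background)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    w_dark, sum_dark = 0, 0.0
    best_t, best_between = 0, -1.0
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is in the dark class; larger t changes nothing
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(pixels):
    """Map each pixel to 1 (ink) if it is at or below the Otsu threshold, else 0."""
    t = otsu_threshold(pixels)
    return [1 if p <= t else 0 for p in pixels]


if __name__ == "__main__":
    # Synthetic bimodal "page": dark ink around 20-30, light paper around 200-220.
    page = [20] * 50 + [30] * 50 + [200] * 100 + [220] * 100
    print(otsu_threshold(page), sum(binarize(page)))
```

A global threshold like this is simple and fast; pages with uneven illumination often call for a local (adaptive) method instead.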
Document zone content classification and its performance evaluation
This paper describes an algorithm for determining the content type of a given zone within a document image. We take a statistically based approach and represent each zone with a 25-dimensional feature vector. An optimized decision tree classifier is used to classify each zone into one of nine zone content classes. A performance evaluation protocol is proposed. The training and t...
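The excerpt mentions 25-dimensional feature vectors but does not list the features. The sketch below computes a few statistics of the kind often used to describe a zone (foreground density, mean horizontal and vertical black run lengths, aspect ratio) from a binary zone image. These four features and all names are hypothetical examples, not the paper's feature set; vectors like these could then be passed to a decision-tree learner.

```python
def run_lengths(line):
    """Lengths of maximal runs of 1s (black pixels) in a sequence of 0/1 values."""
    runs, current = [], 0
    for px in line:
        if px:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def zone_features(zone):
    """Illustrative features for a binary zone given as a list of rows (1 = black)."""
    height, width = len(zone), len(zone[0])
    black = sum(sum(row) for row in zone)
    h_runs = [r for row in zone for r in run_lengths(row)]
    columns = [[zone[y][x] for y in range(height)] for x in range(width)]
    v_runs = [r for col in columns for r in run_lengths(col)]
    return [
        black / (height * width),  # foreground (black-pixel) density
        mean(h_runs),              # mean horizontal black run length
        mean(v_runs),              # mean vertical black run length
        width / height,            # aspect ratio of the zone
    ]


if __name__ == "__main__":
    zone = [
        [1, 1, 0, 0],
        [1, 1, 0, 1],
    ]
    print(zone_features(zone))
```

Run-length statistics help separate text (many short, regular runs) from halftones and rulings (long or irregular runs), which is why they are a common starting point for zone content features.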